Clustering using Vector Membership: An Extension of the Fuzzy C-Means Algorithm
Clustering is an important facet of exploratory data mining and finds
extensive use in several fields. In this paper, we propose an extension of the
classical Fuzzy C-Means clustering algorithm. The proposed algorithm,
abbreviated as VFC, adopts a multi-dimensional membership vector for each data
point instead of the traditional, scalar membership value defined in the
original algorithm. The membership vector for each point is obtained by
considering each feature of that point separately and computing an individual
membership value for each. We also propose an algorithm to efficiently
allocate the initial cluster centers close to the actual centers, so as to
facilitate rapid convergence. Further, we propose a scheme to achieve crisp
clustering using the VFC algorithm. The proposed clustering scheme has
been tested on two standard data sets to analyze its performance. We
also examine the efficacy of the proposed scheme by analyzing its performance
on image segmentation examples and comparing it with the classical Fuzzy
C-means clustering algorithm.
Comment: 6 pages, 8 figures and 1 table (Conference Paper)
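As a rough illustration of the vector-membership idea (not necessarily the paper's exact formulation), the standard FCM membership update can be applied independently along each feature, yielding a per-cluster membership vector for every point instead of a scalar. The function name and fuzzifier default below are assumptions for the sketch.

```python
import numpy as np

def feature_memberships(X, centers, m=2.0, eps=1e-9):
    """Per-feature fuzzy memberships: for each point, cluster, and feature,
    apply the classical FCM membership formula using the 1-D distance along
    that feature alone (illustrative sketch of vector membership)."""
    # dist[i, j, f] = |x_if - c_jf|, distance along each feature separately
    dist = np.abs(X[:, None, :] - centers[None, :, :]) + eps   # (n, c, d)
    # standard FCM membership formula, applied independently per feature
    ratio = dist[:, :, None, :] / dist[:, None, :, :]          # (n, c, c, d)
    u = 1.0 / np.sum(ratio ** (2.0 / (m - 1.0)), axis=2)       # (n, c, d)
    return u  # memberships sum to 1 over clusters, for every feature
```

For each feature, the memberships across clusters still sum to one, so the output is a valid membership vector per point-cluster pair.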
Signal Processing Grand Challenge 2023 -- e-Prevention: Sleep Behavior as an Indicator of Relapses in Psychotic Patients
This paper presents the approach and results of USC SAIL's submission to the
Signal Processing Grand Challenge 2023 - e-Prevention (Task 2), on detecting
relapses in psychotic patients. Relapse prediction has proven to be
challenging, primarily due to the heterogeneity of symptoms and responses to
treatment between individuals. We address these challenges by investigating the
use of sleep behavior features to estimate relapse days as outliers in an
unsupervised machine learning setting. We extract informative features from
human activity and heart rate data collected in the wild, and evaluate various
combinations of feature types and time resolutions. We find that short-time
sleep behavior features outperform both their awake counterparts and features
computed over larger time intervals. Our submission ranked 3rd on the task's official leaderboard,
demonstrating the potential of such features as an objective and non-invasive
predictor of psychotic relapses.
Comment: 2 pages, 1 table, ICASSP 2023, Grand Challenges Track
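The core idea of flagging relapse days as outliers relative to a subject's own baseline can be sketched with a simple robust z-score over daily sleep features. This is a minimal illustration, not the challenge entry's actual outlier detector; the feature layout (rows = days, columns = features) is an assumption.

```python
import numpy as np

def relapse_outlier_scores(daily_features):
    """Score each day as an outlier against the subject's own baseline,
    using a median/MAD robust z-score (illustrative sketch only)."""
    X = np.asarray(daily_features, dtype=float)   # rows = days, cols = features
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0) + 1e-9  # robust spread per feature
    z = np.abs(X - med) / mad
    return z.mean(axis=1)  # higher score = more anomalous day
```

Because it is unsupervised and per-subject, such a scorer needs no relapse labels at training time, matching the abstract's setting.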
Unlocking Foundation Models for Privacy-Enhancing Speech Understanding: An Early Study on Low Resource Speech Training Leveraging Label-guided Synthetic Speech Content
Automatic Speech Understanding (ASU) leverages the power of deep learning
models for accurate interpretation of human speech, leading to a wide range of
speech applications that enrich the human experience. However, training a
robust ASU model requires the curation of a large number of speech samples,
creating risks for privacy breaches. In this work, we investigate using
foundation models to assist privacy-enhancing speech computing. Unlike
conventional works focusing primarily on data perturbation or distributed
algorithms, our work studies the possibilities of using pre-trained generative
models to synthesize speech content as training data with just label guidance.
We show that zero-shot learning with training label-guided synthetic speech
content remains a challenging task. On the other hand, our results demonstrate
that the model trained with synthetic speech samples provides an effective
initialization point for low-resource ASU training. This result reveals the
potential to enhance privacy by reducing user data collection and instead using
label-guided synthetic speech content.
A dataset for Audio-Visual Sound Event Detection in Movies
Audio event detection is a widely studied audio processing task, with
applications ranging from self-driving cars to healthcare. In-the-wild datasets
such as Audioset have propelled research in this field. However, many efforts
typically involve manual annotation and verification, which is expensive to
perform at scale. Movies depict various real-life and fictional scenarios, which
makes them a rich resource for mining a wide range of audio events. In this
work, we present a dataset of audio events called Subtitle-Aligned Movie Sounds
(SAM-S). We use publicly-available closed-caption transcripts to automatically
mine over 110K audio events from 430 movies. We identify three dimensions for
categorizing audio events (sound, source, and quality) and present the steps
involved in producing a final taxonomy of 245 sounds. We discuss the choices involved in
generating the taxonomy, and also highlight the human-centered nature of sounds
in our dataset. We establish a baseline performance for audio-only sound
classification of 34.76% mean average precision and show that incorporating
visual information can further improve the performance by about 5%. Data and
code are made available for research at
https://github.com/usc-sail/mica-subtitle-aligned-movie-sound
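Closed captions conventionally mark non-speech sounds in brackets or parentheses (e.g. "[DOG BARKING]", "(door slams)"). A hypothetical extraction step in that spirit can be sketched as below; the actual SAM-S mining rules are not specified in the abstract, and the pattern here is an assumption.

```python
import re

# Non-speech cues in closed captions are often bracketed or parenthesized.
# This sketch pulls out such cues and normalizes them to lowercase.
CUE = re.compile(r"[\[\(]([^\]\)]+)[\]\)]")

def mine_sound_cues(subtitle_lines):
    cues = []
    for line in subtitle_lines:
        for match in CUE.findall(line):
            cues.append(match.strip().lower())
    return cues
```

Running this over subtitle files would yield raw cue strings, which a taxonomy step (like the 245-sound taxonomy described above) could then cluster and label.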
Does Video Summarization Require Videos? Quantifying the Effectiveness of Language in Video Summarization
Video summarization remains a significant challenge in computer vision due to
the sheer size of the input videos to be summarized. We propose an efficient,
language-only video summarizer that achieves competitive accuracy with high
data efficiency. Using only textual captions obtained via a zero-shot approach,
we train a language transformer model and forego image representations. This
method allows us to filter among the representative text vectors and condense
the sequence. Our approach also provides explainability through natural
language, which is readily interpretable by humans and yields textual
summaries of the videos. An ablation study that focuses on modality and data
compression shows that leveraging text modality only effectively reduces input
data processing while retaining comparable results.
Comment: © 2024 IEEE.
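Selecting a condensed set of representative text vectors can be sketched as a greedy facility-location pick over caption embeddings. The selection rule below is an assumption for illustration; the paper's exact filtration method is not given in the abstract.

```python
import numpy as np

def select_representative(caption_vecs, k=3):
    """Greedily pick k captions whose embeddings best cover the rest,
    via facility-location gains on cosine similarity (illustrative sketch)."""
    V = np.asarray(caption_vecs, dtype=float)
    V = V / (np.linalg.norm(V, axis=1, keepdims=True) + 1e-9)
    sim = V @ V.T                              # cosine similarity matrix
    chosen, covered = [], np.zeros(len(V))
    for _ in range(min(k, len(V))):
        # marginal coverage gain of adding each candidate caption
        gains = np.maximum(sim, covered).sum(axis=1) - covered.sum()
        gains[chosen] = -np.inf                # never re-pick a caption
        best = int(np.argmax(gains))
        chosen.append(best)
        covered = np.maximum(covered, sim[best])
    return sorted(chosen)
```

The chosen caption indices would then form the condensed sequence fed to the language transformer, keeping the pipeline entirely text-based.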